Introducing Molly: Distributed Memory Parallelization with LLVM
Programming for distributed memory machines has always been a tedious task,
but necessary because compilers have not been sufficiently able to optimize for
such machines themselves. Molly is an extension to the LLVM compiler toolchain
that is able to distribute and reorganize workload and data if the program is
organized in statically determined loop control-flows. These are represented as
polyhedral integer-point sets that allow program transformations to be applied to
them. Memory distribution and layout can be declared by the programmer as
needed and the necessary asynchronous MPI communication is generated
automatically. The primary motivation is to run Lattice QCD simulations on IBM
Blue Gene/Q supercomputers, but since the implementation is not yet completed,
this paper demonstrates its capabilities using Conway's Game of Life.
Interest rate convergence in the EMS prior to European Monetary Union
In this paper we analyze the convergence of interest rates in the European Monetary System (EMS) in a framework of changing persistence. This allows us to estimate the exact date of full convergence from the data. A change in persistence means that a time series switches from stationarity to non-stationarity, or vice versa. It is often argued that, due to the specific historical situation in the EMS, the interest rate differential was non-stationary before full convergence of interest rates was achieved and stationary afterwards. Our empirical results suggest that the convergence date differed considerably across Belgium, France, the Netherlands and Italy, and they are in line with the conclusions one would draw from a narrative approach. We compare three different estimators for the convergence date and find that the results are quite robust. Our results therefore stress the importance of credibility for monetary policy.
Lattice QCD estimate of the decay rate
We compute the hadronic matrix element relevant to the physical radiative
decay by means of lattice QCD. We use the
(maximally) twisted mass QCD action with Nf=2 light dynamical quarks and from
the computations made at four lattice spacings we were able to take the
continuum limit. The value of the mass ratio we
obtain is consistent with the experimental value, and our prediction for the
form factor is , leading to keV, which is much larger than and within reach of modern experiments. Comment: 19 pages, 4 fig
Perfrewrite -- Program Complexity Analysis via Source Code Instrumentation
ACACES 2012 summer school. Most program profiling methods output the execution time of one specific program execution, but not its computational complexity class in big-O notation. Perfrewrite is a tool based on LLVM's Clang compiler that rewrites a program so that it tracks semantic information while it executes and uses that information to guess memory usage, communication and computational complexity. While source code instrumentation is a standard technique for profiling, using it to derive formulas is an uncommon approach.
Generalizing Hierarchical Parallelism
Since the days of OpenMP 1.0, computer hardware has become more complex,
typically by specializing compute units for coarse- and fine-grained
parallelism in incrementally deeper hierarchies of parallelism. Newer versions
of OpenMP reacted by introducing new mechanisms for querying or controlling its
individual levels, each time adding another concept such as places, teams, and
progress groups. In this paper we propose going back to the roots of OpenMP in
the form of nested parallelism for a simpler model and more flexible handling
of arbitrarily deep hardware hierarchies. Comment: IWOMP'23 preprint